Course 3: Getting and Cleaning Data (Data Wrangling)

Raw Data vs. Processed Data

CodeBook

Files

A common first step in data collection is downloading the data file from the internet. The recommended process for doing so is described next:
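
One way to sketch such a download step in base R is shown here. The helper name `fetch_once` and its return fields are hypothetical, chosen only for illustration; the idea is to create the target directory if needed, download only when the file is missing, and keep a timestamp alongside the path.

```r
# Hypothetical helper: download `url` into `dest` only if it is not already there.
fetch_once <- function(url, dest) {
  dest_dir <- dirname(dest)
  # Create the target directory on first use.
  if (!dir.exists(dest_dir)) {
    dir.create(dest_dir, recursive = TRUE)
  }
  downloaded <- FALSE
  if (!file.exists(dest)) {
    # mode = "wb" keeps binary files (zip, xlsx) intact on Windows.
    download.file(url, destfile = dest, mode = "wb", quiet = TRUE)
    downloaded <- TRUE
  }
  # Return the path plus the time of this call, worth logging with the data.
  list(path = dest, downloaded = downloaded, checked_at = Sys.time())
}
```

A call might look like `res <- fetch_once("https://example.org/data.csv", "./data/data.csv")`, where the URL is a placeholder. Recording `res$checked_at` matters because online files can change between downloads.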

Reading XML

Another Example

Reading JSON

DATA.TABLE

In some cases, data.table is a faster and more efficient alternative to data.frame.
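
As a minimal sketch of what working with it looks like, assuming the data.table package is installed. The toy columns `x`, `y`, and `z` are made up for illustration.

```r
library(data.table)

# Toy data; column names are illustrative only.
DT <- data.table(x = c(3, 1, 2), y = c("a", "b", "a"))

# Rows are filtered with an expression evaluated inside the table.
DT[y == "a"]

# := adds a column by reference, without copying the whole table.
DT[, z := x * 2]

# The second argument (j) computes summaries directly.
DT[, mean(x)]
```

Much of the efficiency comes from operations such as `:=` modifying the table in place, where the equivalent data.frame assignment may copy the data.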

A very fast way to subset is to set keys on the table.
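
A small sketch of keyed subsetting, assuming the data.table package is installed. The tables `DT` and `other` and their columns are invented for the example.

```r
library(data.table)

DT <- data.table(id = c("b", "a", "c", "a"), value = 1:4)

# setkey() sorts the table by `id` in place and marks it as the key.
setkey(DT, id)

# With a key set, a bare value in i is matched against the key column
# via binary search instead of a full scan.
DT["a"]

# Keys are also what X[Y] joins match on when no `on=` is given.
other <- data.table(id = c("a", "c"), label = c("first", "third"), key = "id")
joined <- DT[other]
```

Because `setkey()` reorders the rows, it is worth calling it once up front and not inside a loop.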

https://stackoverflow.com/questions/13618488/what-you-can-do-with-data-frame-that-you-cant-in-data-table

Database Access (MySQL)

HDF5 Access

Webscraping

Other Resources

DPLYR VS DATAFRAME

CREATE A COLUMN

Dataframe

# Shorten 'TORNADO' to 'TOR'; keep every other name unchanged
data$nuevoNombre <- ifelse(data$name == 'TORNADO', 'TOR', data$name)

Dplyr

data %>%
    mutate(nuevoNombre = ifelse(name == 'TORNADO', 'TOR', name))

msleep %>%
    mutate(rem_proportion = sleep_rem / sleep_total)

GROUP

Aggregate

INJ <- aggregate(INJURIES ~ EVTYPE, data, FUN = sum)

Data table

DT <- data.table(data)
INJ <- DT[,sum(INJURIES),by=EVTYPE]

Dplyr

INJ <- data %>% group_by(EVTYPE) %>% summarise(total.sum = sum(INJURIES))

SORT

DataFrame

INJ <- INJ[order(-INJ$V1),]

Dplyr

INJ <- arrange(INJ, desc(V1))

# arrange() ignores grouping by default
INJ %>% arrange(desc(V1))
# unless you ask for it explicitly:
INJ %>% arrange(desc(V1), .by_group = TRUE)
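
One detail worth keeping in mind when combining the grouping and sorting snippets: the name of the summed column depends on how `INJ` was built. data.table names an unnamed result `V1`, `aggregate()` keeps `INJURIES`, and the dplyr version uses whatever name is given in `summarise()`. The sketch below uses base R only, with a made-up stand-in for the storm data.

```r
# Toy stand-in for the storm data; values are made up.
data <- data.frame(
  EVTYPE   = c("TORNADO", "FLOOD", "TORNADO", "HAIL"),
  INJURIES = c(5, 2, 7, 0)
)

# aggregate() keeps the original column name for the summed value.
INJ <- aggregate(INJURIES ~ EVTYPE, data, FUN = sum)

# Sort by that column, largest total first.
INJ <- INJ[order(-INJ$INJURIES), ]
```

With the data.table result the same sort would use `INJ$V1`, as in the snippets above.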

SELECT

Dataframe


event <- c("EVTYPE", "FATALITIES", "INJURIES", "PROPDMG", "PROPDMGEXP", "CROPDMG",
           "CROPDMGEXP")
data <- storm[event]

dplyr

data <- select(storm, EVTYPE, FATALITIES,INJURIES,PROPDMG)

FILTER

Dataframe

data$CROPEXP[data$CROPDMGEXP == "2"]

dplyr

filter(data, CROPDMGEXP >= 16)
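
The two snippets above do not return the same kind of object: indexing a single column gives a vector, while `filter()` gives whole rows. A short base-R sketch of the difference, using an invented three-row data frame:

```r
# Toy data; values are made up for illustration.
data <- data.frame(
  EVTYPE     = c("TORNADO", "FLOOD", "HAIL"),
  CROPDMGEXP = c("2", "K", "2"),
  CROPDMG    = c(10, 5, 1)
)

# Column values for matching rows: returns a vector.
vals <- data$CROPDMG[data$CROPDMGEXP == "2"]

# Whole rows for matching rows: note the trailing comma.
rows <- data[data$CROPDMGEXP == "2", ]

# subset() is the base-R shorthand closest to dplyr::filter().
rows2 <- subset(data, CROPDMGEXP == "2")
```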

Miguel Angel Huerta

October 16, 2018